Irish Weather Dataset: 2017 Analysis

This work is produced for academic purposes only.

This report considers the Irish weather data from Ireland’s weather observing stations in 2017 and includes energy generation data for 2017 in the tibble eirgrid17.

Copyright is associated with Met Éireann and EirGrid.

The data used in this report were obtained from www.met.ie and http://www.eirgridgroup.com/how-the-grid-works/renewables/

This data is published under a Creative Commons Attribution 4.0 International (CC BY 4.0) licence.

Met Éireann does not accept any liability whatsoever for any error or omission in the data, their availability, or for any loss or damage arising from their use.


Project Background Information

The dataset is introduced as follows:

aimsir17: Irish Weather Observing Stations Hourly Records for 2017

Named after the Irish name for weather, this package contains tidied data from the Irish Meteorological Service’s hourly observations for 2017. In all, the data sets include observations from 25 weather stations, along with the latitude and longitude coordinates of each weather station. It also includes energy generation data for Ireland and Northern Ireland (2017), including wind generation data.

There are three datasets in this package.

  • eirgrid17 - Description: EirGrid System Data Quarterly Hourly for 2017

  • observations - Description: Weather Observing Stations Records 01-Jan-2017 to 31-Dec-2017

  • stations - Description: Summary of the weather observing stations with observations

We will look at a few specific events and derive insights from the given data sets to know more about Ireland’s weather in the year 2017.

The data sets contain hourly records from January to December 2017 for 25 weather stations positioned across Ireland, together with the latitude and longitude coordinates of each station. The data were collected by the Irish Meteorological Service (Met Éireann).

We have merged the observations and stations data sets into one using “station” as the common column, so that the newly generated data set has the columns of both.

Fewer than 0.1% of the rows in the given dataset were duplicates, and about 1.4% of the values in the data.frame named “observations” were NA. After merging, the remaining NAs (about 1% of all values) were replaced with 0. The variables already had appropriate types and the structure of the data sets suited the project, so no further data manipulation was required.

Packages Required

# Loading libraries into our Notebook
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.5     ✓ stringr 1.4.0
## ✓ tidyr   1.1.4     ✓ forcats 0.5.1
## ✓ readr   2.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(ggplot2)
library(aimsir17)
library(hrbrthemes)
## NOTE: Either Arial Narrow or Roboto Condensed fonts are required to use these themes.
##       Please use hrbrthemes::import_roboto_condensed() to install Roboto Condensed and
##       if Arial Narrow is not on your system, please see https://bit.ly/arialnarrow
library(GGally)
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(xts)
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## 
## Attaching package: 'xts'
## The following objects are masked from 'package:dplyr':
## 
##     first, last
library(dygraphs)

Data Cleaning and Preparation

# Looking out for NA's
print(mean(is.na(observations))*100) # Total percentage of NA's in Observations
## [1] 1.360122
print(mean(is.na(stations))*100) # Total percentage of NA's in Stations
## [1] 0

Variable selection and data merging

# Generating a new data.frame by merging the observations and stations data
os <- merge(observations, stations, by = "station")

# Checking for NA's
mean(is.na(os))*100
## [1] 1.020091
# Replacing every NA with 0
os[is.na(os)] <- 0
mean(is.na(os))*100
## [1] 0
head(os)
##   station year month day hour                date rain temp rhum    msl wdsp
## 1 ATHENRY 2017     1   1    0 2017-01-01 00:00:00  0.0  5.2   89 1021.9    8
## 2 ATHENRY 2017     1   1    1 2017-01-01 01:00:00  0.0  4.7   89 1022.0    9
## 3 ATHENRY 2017     1   1    2 2017-01-01 02:00:00  0.0  4.2   90 1022.1    8
## 4 ATHENRY 2017     1   1    3 2017-01-01 03:00:00  0.1  3.5   87 1022.5    9
## 5 ATHENRY 2017     1   1    4 2017-01-01 04:00:00  0.1  3.2   89 1022.7    8
## 6 ATHENRY 2017     1   1    5 2017-01-01 05:00:00  0.0  2.1   91 1023.3    8
##   wddir county height latitude longitude
## 1   320 Galway     40   53.289    -8.786
## 2   320 Galway     40   53.289    -8.786
## 3   320 Galway     40   53.289    -8.786
## 4   330 Galway     40   53.289    -8.786
## 5   330 Galway     40   53.289    -8.786
## 6   330 Galway     40   53.289    -8.786
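Replacing NA with 0 treats a missing reading as a genuine zero, which is probably why the summary of `os` later reports a minimum of 0 for `msl` and `rhum`. The following is a minimal sketch of an alternative that keeps the NAs and skips them at calculation time. The data frame `obs_demo` and its values are hypothetical stand-ins for a few rows of `os`.

```r
# Hypothetical toy data standing in for a few rows of `os`
obs_demo <- data.frame(
  station = c("A", "A", "B", "B"),
  msl     = c(1021.9, NA, 1019.9, 1020.2),
  rain    = c(0.0, 0.1, NA, 0.2)
)

# Percentage of missing values in each column
na_pct <- colMeans(is.na(obs_demo)) * 100

# Mean that skips missing readings instead of counting them as 0
msl_mean_skip <- mean(obs_demo$msl, na.rm = TRUE)

# The same mean after zero-filling, for comparison
msl_zero <- obs_demo$msl
msl_zero[is.na(msl_zero)] <- 0
msl_mean_zero <- mean(msl_zero)

c(skip = msl_mean_skip, zero_filled = msl_mean_zero)
```

The zero-filled mean is pulled downwards by the artificial 0, while the `na.rm = TRUE` version reflects only the readings that were actually taken.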

Data Summary

We have three data sets: 1) observations, 2) stations and 3) eirgrid17.

glimpse(os)
## Rows: 219,000
## Columns: 16
## $ station   <chr> "ATHENRY", "ATHENRY", "ATHENRY", "ATHENRY", "ATHENRY", "ATHE…
## $ year      <dbl> 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, …
## $ month     <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ day       <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ hour      <int> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17…
## $ date      <dttm> 2017-01-01 00:00:00, 2017-01-01 01:00:00, 2017-01-01 02:00:…
## $ rain      <dbl> 0.0, 0.0, 0.0, 0.1, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, …
## $ temp      <dbl> 5.2, 4.7, 4.2, 3.5, 3.2, 2.1, 2.0, 1.7, 1.0, 1.1, 3.0, 4.3, …
## $ rhum      <dbl> 89, 89, 90, 87, 89, 91, 89, 89, 91, 91, 84, 78, 75, 72, 72, …
## $ msl       <dbl> 1021.9, 1022.0, 1022.1, 1022.5, 1022.7, 1023.3, 1023.5, 1024…
## $ wdsp      <dbl> 8, 9, 8, 9, 8, 8, 7, 7, 7, 8, 9, 12, 11, 12, 11, 11, 11, 6, …
## $ wddir     <dbl> 320, 320, 320, 330, 330, 330, 330, 340, 330, 330, 320, 350, …
## $ county    <chr> "Galway", "Galway", "Galway", "Galway", "Galway", "Galway", …
## $ height    <dbl> 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, …
## $ latitude  <dbl> 53.289, 53.289, 53.289, 53.289, 53.289, 53.289, 53.289, 53.2…
## $ longitude <dbl> -8.786, -8.786, -8.786, -8.786, -8.786, -8.786, -8.786, -8.7…

The following variables are recorded for each observation:

  • station (weather station name)
  • year (2017)
  • month (month of year 1-12)
  • day (day of month 1-31)
  • hour (hour of day 0-23)
  • date (date-time object in R)
  • rain (hourly rainfall in mm)
  • temp (hourly temperature in °C)
  • rhum (relative humidity in percent)
  • msl (mean sea level pressure in hPa)
  • wdsp (mean wind speed in knots)
  • wddir (predominant wind direction in degrees)
summary(os)
##    station               year          month             day       
##  Length:219000      Min.   :2017   Min.   : 1.000   Min.   : 1.00  
##  Class :character   1st Qu.:2017   1st Qu.: 4.000   1st Qu.: 8.00  
##  Mode  :character   Median :2017   Median : 7.000   Median :16.00  
##                     Mean   :2017   Mean   : 6.526   Mean   :15.72  
##                     3rd Qu.:2017   3rd Qu.:10.000   3rd Qu.:23.00  
##                     Max.   :2017   Max.   :12.000   Max.   :31.00  
##       hour            date                          rain        
##  Min.   : 0.00   Min.   :2017-01-01 00:00:00   Min.   : 0.0000  
##  1st Qu.: 5.75   1st Qu.:2017-04-02 05:45:00   1st Qu.: 0.0000  
##  Median :11.50   Median :2017-07-02 11:30:00   Median : 0.0000  
##  Mean   :11.50   Mean   :2017-07-02 11:30:00   Mean   : 0.1226  
##  3rd Qu.:17.25   3rd Qu.:2017-10-01 17:15:00   3rd Qu.: 0.0000  
##  Max.   :23.00   Max.   :2017-12-31 23:00:00   Max.   :16.6000  
##       temp            rhum            msl            wdsp       
##  Min.   :-6.20   Min.   :  0.0   Min.   :   0   Min.   : 0.000  
##  1st Qu.: 7.50   1st Qu.: 77.0   1st Qu.:1007   1st Qu.: 4.000  
##  Median :10.60   Median : 86.0   Median :1016   Median : 8.000  
##  Mean   :10.29   Mean   : 83.6   Mean   :1014   Mean   : 8.671  
##  3rd Qu.:13.40   3rd Qu.: 93.0   3rd Qu.:1022   3rd Qu.:12.000  
##  Max.   :28.30   Max.   :100.0   Max.   :1039   Max.   :59.000  
##      wddir          county              height          latitude    
##  Min.   :  0.0   Length:219000      Min.   :  9.00   Min.   :51.48  
##  1st Qu.:140.0   Class :character   1st Qu.: 24.00   1st Qu.:52.69  
##  Median :220.0   Mode  :character   Median : 46.00   Median :53.36  
##  Mean   :196.2                      Mean   : 58.36   Mean   :53.26  
##  3rd Qu.:270.0                      3rd Qu.: 75.00   3rd Qu.:53.91  
##  Max.   :360.0                      Max.   :201.00   Max.   :55.37  
##    longitude      
##  Min.   :-10.241  
##  1st Qu.: -8.918  
##  Median : -8.244  
##  Mean   : -8.138  
##  3rd Qu.: -7.310  
##  Max.   : -6.241
head(stations)
## # A tibble: 6 × 5
##   station      county height latitude longitude
##   <chr>        <chr>   <dbl>    <dbl>     <dbl>
## 1 ATHENRY      Galway     40     53.3     -8.79
## 2 BALLYHAISE   Cavan      78     54.1     -7.31
## 3 BELMULLET    Mayo        9     54.2    -10.0 
## 4 CASEMENT     Dublin     91     53.3     -6.44
## 5 CLAREMORRIS  Mayo       68     53.7     -8.99
## 6 CORK AIRPORT Cork      155     51.8     -8.49
glimpse(stations)
## Rows: 25
## Columns: 5
## $ station   <chr> "ATHENRY", "BALLYHAISE", "BELMULLET", "CASEMENT", "CLAREMORR…
## $ county    <chr> "Galway", "Cavan", "Mayo", "Dublin", "Mayo", "Cork", "Dublin…
## $ height    <dbl> 40, 78, 9, 91, 68, 155, 71, 83, 33, 75, 62, 201, 21, 20, 34,…
## $ latitude  <dbl> 53.289, 54.051, 54.228, 53.306, 53.711, 51.847, 53.428, 53.5…
## $ longitude <dbl> -8.786, -7.310, -10.007, -6.439, -8.993, -8.486, -6.241, -6.…

The following variables are recorded for each station:

  • station (weather station name)
  • county (the county in which the station is located)
  • height (height of the station)
  • latitude and longitude (the exact location of the station)
summary(stations)
##    station             county              height          latitude    
##  Length:25          Length:25          Min.   :  9.00   Min.   :51.48  
##  Class :character   Class :character   1st Qu.: 24.00   1st Qu.:52.69  
##  Mode  :character   Mode  :character   Median : 46.00   Median :53.36  
##                                        Mean   : 58.36   Mean   :53.26  
##                                        3rd Qu.: 75.00   3rd Qu.:53.91  
##                                        Max.   :201.00   Max.   :55.37  
##    longitude      
##  Min.   :-10.241  
##  1st Qu.: -8.918  
##  Median : -8.244  
##  Mean   : -8.138  
##  3rd Qu.: -7.310  
##  Max.   : -6.241
head(eirgrid17)
## # A tibble: 6 × 15
##    year month   day  hour minute date                NIGeneration NIDemand
##   <dbl> <dbl> <int> <int>  <int> <dttm>                     <dbl>    <dbl>
## 1  2017     1     1     0      0 2017-01-01 00:00:00         889.     776.
## 2  2017     1     1     0     15 2017-01-01 00:15:00         922.     770.
## 3  2017     1     1     0     30 2017-01-01 00:30:00         908.     761.
## 4  2017     1     1     0     45 2017-01-01 00:45:00         919.     743.
## 5  2017     1     1     1      0 2017-01-01 01:00:00         882.     749.
## 6  2017     1     1     1     15 2017-01-01 01:15:00         849.     742.
## # … with 7 more variables: NIWindAvailability <dbl>, NIWindGeneration <dbl>,
## #   IEGeneration <dbl>, IEDemand <dbl>, IEWindAvailability <dbl>,
## #   IEWindGeneration <dbl>, SNSP <chr>
glimpse(eirgrid17)
## Rows: 35,040
## Columns: 15
## $ year               <dbl> 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 201…
## $ month              <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ day                <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ hour               <int> 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, …
## $ minute             <int> 0, 15, 30, 45, 0, 15, 30, 45, 0, 15, 30, 45, 0, 15,…
## $ date               <dttm> 2017-01-01 00:00:00, 2017-01-01 00:15:00, 2017-01-…
## $ NIGeneration       <dbl> 889.005, 922.234, 908.122, 918.802, 882.441, 848.86…
## $ NIDemand           <dbl> 775.931, 770.233, 761.186, 742.718, 749.238, 742.45…
## $ NIWindAvailability <dbl> 175.065, 182.866, 169.796, 167.501, 174.094, 189.92…
## $ NIWindGeneration   <dbl> 198.202, 207.765, 193.103, 190.757, 195.790, 212.95…
## $ IEGeneration       <dbl> 3288.57, 3282.12, 3224.27, 3171.27, 3190.28, 3184.6…
## $ IEDemand           <dbl> 2921.44, 2884.19, 2806.38, 2718.77, 2682.91, 2649.8…
## $ IEWindAvailability <dbl> 1064.79, 965.60, 915.35, 895.38, 1028.03, 1144.17, …
## $ IEWindGeneration   <dbl> 1044.72, 957.74, 900.46, 870.81, 998.31, 1119.12, 1…
## $ SNSP               <chr> "28.4%", "26.4%", "25.2%", "24.7%", "27.9%", "31.4%…

The following variables are recorded for each eirgrid17 record:

  • year (2017)
  • month (month of year 1-12)
  • day (day of month 1-31)
  • hour (hour of the day 0-23)
  • minute (minute of the hour: 0, 15, 30 or 45)
  • date (date-time object in R)
  • NIGeneration (Northern Ireland energy generation)
  • NIDemand (Northern Ireland energy demand)
  • NIWindAvailability (Northern Ireland wind availability)
  • NIWindGeneration (Northern Ireland wind generation)
  • IEGeneration (Ireland energy generation)
  • IEDemand (Ireland energy demand)
  • IEWindAvailability (Ireland wind availability)
  • IEWindGeneration (Ireland wind generation)
  • SNSP (System Non-Synchronous Penetration, a real-time measure of the percentage of generation that comes from non-synchronous sources)
summary(eirgrid17)
##       year          month             day             hour      
##  Min.   :2017   Min.   : 1.000   Min.   : 1.00   Min.   : 0.00  
##  1st Qu.:2017   1st Qu.: 4.000   1st Qu.: 8.00   1st Qu.: 5.75  
##  Median :2017   Median : 7.000   Median :16.00   Median :11.50  
##  Mean   :2017   Mean   : 6.527   Mean   :15.72   Mean   :11.50  
##  3rd Qu.:2017   3rd Qu.:10.000   3rd Qu.:23.00   3rd Qu.:17.25  
##  Max.   :2017   Max.   :12.000   Max.   :31.00   Max.   :23.00  
##      minute           date                      NIGeneration   
##  Min.   : 0.00   Min.   :2017-01-01 00:00:00   Min.   : 375.9  
##  1st Qu.:11.25   1st Qu.:2017-04-02 06:56:15   1st Qu.: 832.6  
##  Median :22.50   Median :2017-07-02 12:52:30   Median : 934.1  
##  Mean   :22.50   Mean   :2017-07-02 12:28:10   Mean   : 942.6  
##  3rd Qu.:33.75   3rd Qu.:2017-10-01 18:48:45   3rd Qu.:1045.4  
##  Max.   :45.00   Max.   :2017-12-31 23:45:00   Max.   :1538.5  
##     NIDemand      NIWindAvailability NIWindGeneration  IEGeneration 
##  Min.   : 456.7   Min.   :  0.97     Min.   :  0.00   Min.   :1860  
##  1st Qu.: 712.5   1st Qu.: 78.59     1st Qu.: 73.12   1st Qu.:2972  
##  Median : 940.9   Median :199.78     Median :190.88   Median :3273  
##  Mean   : 926.9   Mean   :251.05     Mean   :232.18   Mean   :3272  
##  3rd Qu.:1097.6   3rd Qu.:390.97     3rd Qu.:365.40   3rd Qu.:3549  
##  Max.   :1627.5   Max.   :846.02     Max.   :831.18   Max.   :4772  
##     IEDemand    IEWindAvailability IEWindGeneration     SNSP          
##  Min.   :1932   Min.   :   5.86    Min.   :   0.0   Length:35040      
##  1st Qu.:2648   1st Qu.: 332.29    1st Qu.: 313.3   Class :character  
##  Median :3236   Median : 748.40    Median : 713.4   Mode  :character  
##  Mean   :3167   Mean   : 888.56    Mean   : 825.1                     
##  3rd Qu.:3606   3rd Qu.:1327.42    3rd Qu.:1251.9                     
##  Max.   :4940   Max.   :2707.19    Max.   :2615.6
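Because SNSP is stored as text (for example "28.4%"), `summary()` reports only its length and class. A minimal sketch of converting it to a number is shown below. The vector `snsp_chr` holds the first four SNSP values from the `glimpse(eirgrid17)` output, and the names `snsp_chr` and `snsp_num` are illustrative only.

```r
# First four SNSP values as shown by glimpse(eirgrid17)
snsp_chr <- c("28.4%", "26.4%", "25.2%", "24.7%")

# Strip the percent sign and convert the remainder to numeric
snsp_num <- as.numeric(sub("%", "", snsp_chr, fixed = TRUE))

# Numeric summary is now possible
summary(snsp_num)
```

The same conversion could be applied to the full column, for instance inside a `mutate()` call, before summarising or plotting it.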

Rainfall Analysis

Wettest months

Let’s look at the months with the most rainfall in 2017. This will help us understand which months have a high likelihood of rain.

# Max Rainy months, with an index value where 100 is the wettest
os %>% 
  group_by(month.name[month]) %>%
  summarise(rainMonth=sum(rain)) %>%
  arrange(rainMonth) %>%
  mutate(Index=100*rainMonth/max(rainMonth)) %>%
  print(n=25)
## # A tibble: 12 × 3
##    `month.name[month]` rainMonth Index
##    <chr>                   <dbl> <dbl>
##  1 April                    506.  16.0
##  2 January                 1466.  46.5
##  3 May                     1543.  49.0
##  4 February                2079.  66.0
##  5 July                    2216.  70.3
##  6 November                2409   76.4
##  7 June                    2440.  77.4
##  8 August                  2536   80.5
##  9 October                 2648   84.0
## 10 March                   2754.  87.4
## 11 December                3098.  98.3
## 12 September               3151. 100
# Observation: September had the highest rainfall and April the lowest (totals summed over all stations).
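Month names stored as plain text sort alphabetically, which scrambles the calendar order in tables and on plot axes. One option, sketched here under the assumption that calendar order is wanted, is to convert the names to a factor whose levels follow R's built-in `month.name`. The data frame `rain_by_month` is a hypothetical subset using rounded totals from the table above.

```r
# Hypothetical subset of the monthly totals (rounded values from the table)
rain_by_month <- data.frame(
  month     = c("April", "January", "September", "December"),
  rainMonth = c(506, 1466, 3151, 3098)
)

# A factor with calendar-ordered levels makes sorting follow the calendar
rain_by_month$month <- factor(rain_by_month$month, levels = month.name)
rain_by_month <- rain_by_month[order(rain_by_month$month), ]

# Index where 100 is the wettest month
rain_by_month$Index <- 100 * rain_by_month$rainMonth / max(rain_by_month$rainMonth)
rain_by_month
```

The same factor also gives ggplot2 a calendar-ordered axis when the month is mapped to `x`.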

Rainfall over time

Let’s plot hourly rainfall against date and time for 2017. The interactive chart makes it possible to zoom into any period and see when the heaviest rain fell.

dateto <- ymd_hms(os$date)
don <- xts(x = os$rain, order.by = dateto)
dygraph(don) %>%
  dyOptions(labelsUTC = TRUE, fillGraph=TRUE, fillAlpha=0.1, drawGrid = FALSE, colors="#D8AE5A") %>%
  dyRangeSelector() %>%
  dyCrosshair(direction = "vertical") %>%
  dyHighlight(highlightCircleSize = 5, highlightSeriesBackgroundAlpha = 0.2, hideOnMouseOut = FALSE)  %>%
  dyRoller(rollPeriod = 1)
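Since `os` holds records for 25 stations, every timestamp occurs many times in the series built above. One possible refinement, sketched below with hypothetical toy data, is to average rainfall across stations first so that the series has one value per timestamp. The names `rain_demo` and `rain_hourly` are illustrative and not part of the original analysis.

```r
# Hypothetical toy data: two stations, three hourly timestamps
rain_demo <- data.frame(
  station = rep(c("A", "B"), each = 3),
  date    = rep(as.POSIXct(c("2017-01-01 00:00:00",
                             "2017-01-01 01:00:00",
                             "2017-01-01 02:00:00"), tz = "UTC"), times = 2),
  rain    = c(0.0, 0.2, 0.4, 0.2, 0.0, 0.6)
)

# One row per timestamp: mean rainfall across stations
rain_hourly <- aggregate(rain ~ date, data = rain_demo, FUN = mean)
rain_hourly
```

The aggregated columns could then be passed to `xts(rain_hourly$rain, order.by = rain_hourly$date)` and on to `dygraph()` as before.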

Temperature Analysis

Temperature by month

Let’s look at the temperature month by month. This will help us understand the temperature range of each month and identify the seasons.

ggplot(os,aes(x = month.name[month],y = temp)) + 
  geom_point(aes(colour = temp)) +
  scale_colour_gradient2(low = "blue", mid = "green" , high = "red", midpoint = 16) + 
  geom_smooth(color = "red",size = 1) +
  scale_y_continuous(limits = c(5,30), breaks = seq(5,30,5)) +
  ggtitle("Hourly temperature by month") +
  xlab("Month") + ylab("Temperature (°C)")
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 28517 rows containing non-finite values (stat_smooth).
## Warning: Removed 28517 rows containing missing values (geom_point).

# Observations:
# Coldest months by monthly mean temperature: December, January, February
# Warmest months by monthly mean temperature: June, July, August
# The highest individual readings occurred in May, June and July
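The plot above draws every hourly reading. If a daily average is wanted instead, one way to obtain it is to group the readings by calendar day. This is a minimal sketch with hypothetical toy data; `temp_demo` and `daily_mean` are illustrative names.

```r
# Hypothetical toy data: two readings on each of two days
temp_demo <- data.frame(
  date = as.POSIXct(c("2017-01-01 00:00:00", "2017-01-01 12:00:00",
                      "2017-01-02 00:00:00", "2017-01-02 12:00:00"), tz = "UTC"),
  temp = c(5.2, 7.2, 1.0, 4.0)
)

# Reduce each timestamp to its calendar day
temp_demo$day <- as.Date(temp_demo$date, tz = "UTC")

# Mean temperature per day
daily_mean <- aggregate(temp ~ day, data = temp_demo, FUN = mean)
daily_mean
```

Plotting `daily_mean` with `day` on the x axis would give a continuous time axis, which also suits `geom_smooth()` better than month names.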

Hourly average temperature

Let’s look at the average temperature hour by hour. This will help us understand how the temperature varies over the day in each month.

hourly_mean <- os %>%
  group_by(month, date, hour) %>%
  summarize(tem_avg = mean(temp))
## `summarise()` has grouped output by 'month', 'date'. You can override using the `.groups` argument.
hourly_mean
## # A tibble: 8,760 × 4
## # Groups:   month, date [8,760]
##    month date                 hour tem_avg
##    <dbl> <dttm>              <int>   <dbl>
##  1     1 2017-01-01 00:00:00     0    6.01
##  2     1 2017-01-01 01:00:00     1    5.23
##  3     1 2017-01-01 02:00:00     2    4.68
##  4     1 2017-01-01 03:00:00     3    4.4 
##  5     1 2017-01-01 04:00:00     4    4.05
##  6     1 2017-01-01 05:00:00     5    3.6 
##  7     1 2017-01-01 06:00:00     6    3.14
##  8     1 2017-01-01 07:00:00     7    3.02
##  9     1 2017-01-01 08:00:00     8    2.94
## 10     1 2017-01-01 09:00:00     9    3.15
## # … with 8,750 more rows
hourly_plot <- ggplot(hourly_mean, aes(x = hour, y = tem_avg)) +
  geom_point(colour = "#FF66FF", size = 0.05) +
  geom_smooth(colour = "#66CCFF", size = 0.25) +
  scale_y_continuous(limits = c(5, 23)) +
  ggtitle("Hourly average temperature") +
  xlab("Hour") + ylab("Average Temperature (°C)")

# create faceted panel
hourly_plot + facet_wrap(~month.name[month])
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 1145 rows containing non-finite values (stat_smooth).
## Warning: Removed 1145 rows containing missing values (geom_point).

# Observations:
# Temperature rises and falls markedly over the day in May, June, July, August and September.
# In October, November, December, January and February it stays within a narrow range.

Minimum, maximum and mean of rain and temperature

Let’s summarise the minimum, maximum and mean temperature, together with the rainfall, for each month. This gives the figures behind the seasonal pattern.

# Monthly minimum, maximum and mean temperature, plus minimum and mean hourly rainfall
max_min <- os %>% 
  group_by(month = month.name[month]) %>%
  summarise(min_temp = min(temp, na.rm = TRUE),
            max_temp = max(temp, na.rm = TRUE),
            Mean_temp = mean(temp, na.rm = TRUE),
            min_rain = min(rain, na.rm = TRUE),
            Mean_rain = mean(rain, na.rm = TRUE))

max_min
## # A tibble: 12 × 6
##    month     min_temp max_temp Mean_temp min_rain Mean_rain
##    <chr>        <dbl>    <dbl>     <dbl>    <dbl>     <dbl>
##  1 April         -1.9     18.6      9.01        0    0.0281
##  2 August         4.3     22.9     14.3         0    0.136 
##  3 December      -6.2     14        6.16        0    0.167 
##  4 February      -4.9     14.7      6.62        0    0.124 
##  5 January       -5.6     12.8      6.48        0    0.0788
##  6 July           3.1     25.5     14.7         0    0.119 
##  7 June           3.6     28.3     13.9         0    0.136 
##  8 March         -3       18.8      8.12        0    0.148 
##  9 May           -0.9     25.7     12.3         0    0.0830
## 10 November      -3.4     16.3      7.42        0    0.134 
## 11 October        0.2     19.7     11.7         0    0.142 
## 12 September      1.2     23.1     12.6         0    0.175
# Observations:
# Lowest temperature: December (-6.2 °C)
# Highest temperature: June (28.3 °C)
# Wettest month (highest mean hourly rainfall): September

Annual mean rainfall and temperature

Let’s look at rainfall and temperature over the whole year. This summarises the overall level of rainfall and temperature for 2017.

# Mean hourly rainfall (mm) across all stations and hours
mean(os$rain, na.rm = TRUE)
## [1] 0.1225785
# Annual mean temperature (°C)
mean(os$temp, na.rm = TRUE)
## [1] 10.29018
# Observations:
# The annual mean temperature in Ireland in 2017 was about 10.29 °C.
# The mean hourly rainfall per station was about 0.123 mm; this is an hourly mean, not an annual total.
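The figure above is a mean of hourly readings. To express rainfall as an annual amount, one approach is to sum the hourly values for each station and then average those totals. The sketch below uses hypothetical toy data; `rain_by_station_demo`, `station_totals` and `mean_annual_total` are illustrative names.

```r
# Hypothetical toy data: three hourly readings for each of two stations
rain_by_station_demo <- data.frame(
  station = c("A", "A", "A", "B", "B", "B"),
  rain    = c(0.5, 0.0, 1.5, 0.2, 0.3, 0.0)
)

# Total rainfall per station
station_totals <- aggregate(rain ~ station, data = rain_by_station_demo, FUN = sum)

# Average of the station totals
mean_annual_total <- mean(station_totals$rain)
mean_annual_total
```

Applied to `os`, the per-station sums would correspond to the TotalRainfall figures reported in the station-wise analysis.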

Humidity Analysis

Correlation between humidity and rain

Let’s look at the relative humidity when rain fell in 2017. This will help us judge how likely rain is at a given humidity.

# Filtering out the Dublin Airport data
DUBLIN_AIRPORT <- filter(os, station == "DUBLIN AIRPORT")
head(DUBLIN_AIRPORT)
##          station year month day hour                date rain temp rhum    msl
## 1 DUBLIN AIRPORT 2017     1   1    0 2017-01-01 00:00:00  0.9  5.3   91 1019.9
## 2 DUBLIN AIRPORT 2017     1   1    1 2017-01-01 01:00:00  0.2  4.9   95 1019.7
## 3 DUBLIN AIRPORT 2017     1   1    2 2017-01-01 02:00:00  0.1  5.0   92 1019.8
## 4 DUBLIN AIRPORT 2017     1   1    3 2017-01-01 03:00:00  0.0  4.2   90 1020.2
## 5 DUBLIN AIRPORT 2017     1   1    4 2017-01-01 04:00:00  0.0  3.6   88 1020.2
## 6 DUBLIN AIRPORT 2017     1   1    5 2017-01-01 05:00:00  0.0  2.8   89 1020.4
##   wdsp wddir county height latitude longitude
## 1   12   340 Dublin     71   53.428    -6.241
## 2    8   310 Dublin     71   53.428    -6.241
## 3    8   310 Dublin     71   53.428    -6.241
## 4   12   330 Dublin     71   53.428    -6.241
## 5   11   330 Dublin     71   53.428    -6.241
## 6   12   330 Dublin     71   53.428    -6.241
# Working hypothesis: rain increases as relative humidity (rhum) rises
# Alternative: rain does not increase as rhum rises
# A basic scatter plot with colour depending on temperature
ggplot(DUBLIN_AIRPORT, aes(y = rain, x = rhum, color = temp)) + 
  geom_point(size = 6) +
  ggtitle("Rain vs Rhum for Dublin Airport") +
  xlab("Rhum") + ylab("Rain")
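A scatter plot shows the shape of a relationship but not its strength. As a hedged complement, a rank correlation can put a number on it. Spearman's method is chosen here as an assumption, because hourly rainfall is heavily skewed towards zero. The data frame `airport_demo` is hypothetical and only illustrates the call.

```r
# Hypothetical toy data: rain tends to appear only at high humidity
airport_demo <- data.frame(
  rhum = c(70, 75, 80, 85, 90, 95, 98),
  rain = c(0.0, 0.0, 0.0, 0.1, 0.2, 0.9, 1.4)
)

# Spearman rank correlation between humidity and rain
rho <- cor(airport_demo$rhum, airport_demo$rain, method = "spearman")
rho
```

A value near 1 would support the working hypothesis, a value near 0 would not. On the real data the same call would take `DUBLIN_AIRPORT$rhum` and `DUBLIN_AIRPORT$rain`.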

Correlation between humidity and temperature

Let’s look at how relative humidity varies with temperature in 2017. This will help us understand how the two variables move together.

# Working hypothesis: rhum falls as temp rises
# Alternative: rhum does not fall as temp rises
# The scatter plot is consistent with the working hypothesis; a plot alone does not prove it.
ggplot(DUBLIN_AIRPORT, aes(x = temp, y = rhum, color = temp)) + 
  geom_point(size = 6) +
  theme_ipsum() +
  ggtitle("Temperature vs Rhum for Dublin Airport") +
  xlab("Temperature") + ylab("Rhum")

# Observation: humidity tends to decrease as temperature increases

Station-wise Analysis

Rainfall per station

Let’s look at rainfall by station, which will help us understand whether rainfall is correlated with height above sea level. The table ranks the weather stations from wettest to driest, with an index value where 100 is the wettest. As we can see, Newport is the wettest station.

h<-os %>% 
  group_by(station) %>%
  summarise(TotalRainfall=sum(rain,na.rm = T)) %>%
  arrange(desc(TotalRainfall)) %>%
  mutate(Index=100*TotalRainfall/max(TotalRainfall)) %>%
  print(n=25)
## # A tibble: 25 × 3
##    station              TotalRainfall Index
##    <chr>                        <dbl> <dbl>
##  1 NEWPORT                      1752. 100  
##  2 VALENTIA OBSERVATORY         1598.  91.2
##  3 KNOCK AIRPORT                1343.  76.7
##  4 BELMULLET                    1243.  71.0
##  5 FINNER                       1222.  69.7
##  6 CLAREMORRIS                  1204   68.7
##  7 ATHENRY                      1199.  68.5
##  8 MARKREE                      1182.  67.5
##  9 CORK AIRPORT                 1162.  66.3
## 10 MALIN HEAD                   1147.  65.5
## 11 MACE HEAD                    1114.  63.6
## 12 SherkinIsland                1072.  61.2
## 13 SHANNON AIRPORT              1069.  61.0
## 14 MOORE PARK                   1016.  58.0
## 15 ROCHES POINT                 1013.  57.8
## 16 MT DILLON                     992.  56.6
## 17 GURTEEN                       983.  56.1
## 18 JOHNSTOWNII                   963   55.0
## 19 MULLINGAR                     952.  54.4
## 20 BALLYHAISE                    952.  54.4
## 21 DUNSANY                       810.  46.2
## 22 OAK PARK                      759.  43.3
## 23 PHOENIX PARK                  732   41.8
## 24 CASEMENT                      705.  40.2
## 25 DUBLIN AIRPORT                662.  37.8

Highest stations

highest_stations <- stations %>%
  arrange(desc(height)) %>%
  slice(1:4) %>%
  pull(station)
highest_stations
## [1] "KNOCK AIRPORT" "CORK AIRPORT"  "MULLINGAR"     "CASEMENT"
# Bar chart of the four highest stations
ggplot(filter(stations, station %in% highest_stations),
       aes(x = height, y = station, fill = station)) +
  geom_col()

# Bar chart of all stations; `stations` has one row per station,
# whereas `os` repeats each height for every hourly record
ggplot(stations, aes(x = height, y = station, fill = station)) +
  geom_col()
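To check directly whether rainfall is related to station height, the station heights can be joined to the rainfall totals and correlated. The sketch below uses only six stations, with heights taken from `head(stations)` and rounded totals from the rainfall table, so it illustrates the method rather than giving a conclusive result. The object names are illustrative.

```r
# Heights of the first six stations, as shown by head(stations)
station_heights <- data.frame(
  station = c("ATHENRY", "BALLYHAISE", "BELMULLET",
              "CASEMENT", "CLAREMORRIS", "CORK AIRPORT"),
  height  = c(40, 78, 9, 91, 68, 155)
)

# Rounded rainfall totals for the same stations, from the rainfall table
station_rain <- data.frame(
  station       = c("ATHENRY", "BALLYHAISE", "BELMULLET",
                    "CASEMENT", "CLAREMORRIS", "CORK AIRPORT"),
  TotalRainfall = c(1199, 952, 1243, 705, 1204, 1162)
)

# Join on station name and compute the Pearson correlation
height_rain <- merge(station_heights, station_rain, by = "station")
height_rain_cor <- cor(height_rain$height, height_rain$TotalRainfall)
height_rain_cor
```

With the full data, `stations` and the summary table `h` could be merged in the same way to cover all 25 stations.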

Newport: Individual Station Analysis

We take the data for the NEWPORT station for further station-level analysis.

# Working on the Newport station
new_port <- filter(os, station == "NEWPORT")

Visualization of correlations

The correlation graph shows the pairwise correlations between the numeric variables recorded at Newport.

library(GGally)

# visualization of correlations
ggcorr(new_port, method = c("everything", "pearson"))
## Warning in ggcorr(new_port, method = c("everything", "pearson")): data in
## column(s) 'station', 'date', 'county' are not numeric and were ignored
## Warning in cor(data, use = method[1], method = method[2]): the standard
## deviation is zero
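Both warnings above come from columns that cannot take part in a correlation: text columns, and columns that are constant within a single station (such as year or height). One way to avoid them, sketched here with a hypothetical toy data frame `newport_demo`, is to keep only numeric columns whose standard deviation is greater than zero.

```r
# Hypothetical toy data with a text column and a constant column
newport_demo <- data.frame(
  station = "NEWPORT",
  year    = 2017,
  temp    = c(5.2, 4.7, 6.1, 7.3),
  rhum    = c(89, 92, 85, 80),
  rain    = c(0.0, 0.3, 0.1, 0.0),
  stringsAsFactors = FALSE
)

# Keep numeric columns only
is_num   <- vapply(newport_demo, is.numeric, logical(1))
num_cols <- newport_demo[, is_num, drop = FALSE]

# Drop columns with zero variation
varying <- vapply(num_cols, function(x) sd(x, na.rm = TRUE) > 0, logical(1))
num_varying <- num_cols[, varying, drop = FALSE]

# Pearson correlation matrix of the remaining columns
cor_matrix <- cor(num_varying, use = "pairwise.complete.obs")
cor_matrix
```

The filtered data frame could be passed to `ggcorr()` in place of `new_port` to produce the same plot without the warnings.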

Wettest month for Newport throughout 2017

This plot shows the hourly rainfall values recorded at Newport in each month. The fill colour encodes the month, not the temperature, so the plot cannot show on its own how rain varies with temperature.

# Hourly rainfall by month for the NEWPORT station
ggplot(new_port, aes(x = month, y = rain, fill = month)) +
  geom_raster() +
  ggtitle("Hourly rainfall by month for NEWPORT Station") +
  xlab("Month") + ylab("Rain")
## Warning: Raster pixels are placed at uneven vertical intervals and will be
## shifted. Consider using geom_tile() instead.

A basic scatter plot with color depending on Temperature

# A basic scatter plot with color depending on Temperature
ggplot(new_port, aes(x=wdsp, y=rain, color=temp)) + 
  geom_point(size=6) +
  theme_ipsum()+
  ggtitle ("Wind Speed vs Rain for NEWPORT Station") +
  xlab("Wind Speed") +  ylab ("Rain")

Showing the Temperature and the Rain for Newport

# Showing the density area and contours of temperature vs rain
ggplot(new_port, aes(x = temp, y = rain)) +
  stat_density_2d(aes(fill = ..level..), geom = "polygon", colour = "white") +
  ggtitle("Temperature vs Rain density levels of NEWPORT Station") +
  xlab("Temperature") + ylab("Rain")

Weather-Events

Joining the data sets to perform analysis

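# os already contains the station columns, so this join adds them again with
# .x/.y suffixes; the .y copies are renamed to station_* below
data_set_prep <- full_join(os, stations, by = "station")
data_set <- rename(data_set_prep,
                   station_county = county.y,
                   station_height = height.y,
                   station_latitude = latitude.y,
                   station_longitude = longitude.y)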
head(data_set)
##   station year month day hour                date rain temp rhum    msl wdsp
## 1 ATHENRY 2017     1   1    0 2017-01-01 00:00:00  0.0  5.2   89 1021.9    8
## 2 ATHENRY 2017     1   1    1 2017-01-01 01:00:00  0.0  4.7   89 1022.0    9
## 3 ATHENRY 2017     1   1    2 2017-01-01 02:00:00  0.0  4.2   90 1022.1    8
## 4 ATHENRY 2017     1   1    3 2017-01-01 03:00:00  0.1  3.5   87 1022.5    9
## 5 ATHENRY 2017     1   1    4 2017-01-01 04:00:00  0.1  3.2   89 1022.7    8
## 6 ATHENRY 2017     1   1    5 2017-01-01 05:00:00  0.0  2.1   91 1023.3    8
##   wddir county.x height.x latitude.x longitude.x station_county station_height
## 1   320   Galway       40     53.289      -8.786         Galway             40
## 2   320   Galway       40     53.289      -8.786         Galway             40
## 3   320   Galway       40     53.289      -8.786         Galway             40
## 4   330   Galway       40     53.289      -8.786         Galway             40
## 5   330   Galway       40     53.289      -8.786         Galway             40
## 6   330   Galway       40     53.289      -8.786         Galway             40
##   station_latitude station_longitude
## 1           53.289            -8.786
## 2           53.289            -8.786
## 3           53.289            -8.786
## 4           53.289            -8.786
## 5           53.289            -8.786
## 6           53.289            -8.786

Weather Events

Earth is a dynamic planet that changes daily. Weather patterns and events are a tremendous part of that change. While these patterns and events are necessary for our planet to continue to be life-sustaining, they can also cause substantial damage and sometimes cost billions of dollars in repair and rescue efforts.

Weather phenomena can be defined as natural events that occur as a result of one or a combination of the water cycle, pressure systems and the Coriolis effect. They often involve or are related to precipitation, wind or heat.

One such significant weather event found in 2017 was Storm Ophelia.

Coldest and Warmest County in Ireland

c_temp1 <- aggregate(temp ~ station_county, data = data_set, mean)
c_temp1
##    station_county      temp
## 1          Carlow 10.402158
## 2           Cavan  9.749075
## 3           Clare 10.851427
## 4            Cork 10.757840
## 5         Donegal 10.295639
## 6          Dublin 10.273809
## 7          Galway 10.467882
## 8           Kerry 11.313459
## 9            Mayo  9.967186
## 10          Meath  9.897728
## 11      Roscommon  9.826450
## 12          Sligo  9.753607
## 13      Tipperary 10.034269
## 14      Westmeath  9.660422
## 15        Wexford 10.517272
# Westmeath is coldest county - when average temperature is considered
# Kerry is the warmest county - when average temperature is considered
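
The coldest and warmest counties can also be picked out programmatically rather than by eye. The sketch below is only an illustration: it rebuilds a hand-copied subset of the table above (the name `county_means` is ours, not part of the report) and uses base R's `which.min()` and `which.max()`.

```r
# Hand-copied subset of the county means printed above (illustration only).
county_means <- data.frame(
  station_county = c("Cavan", "Kerry", "Sligo", "Westmeath", "Wexford"),
  temp = c(9.749075, 11.313459, 9.753607, 9.660422, 10.517272),
  stringsAsFactors = FALSE
)

# which.min()/which.max() return the row index of the extreme value.
coldest <- county_means$station_county[which.min(county_means$temp)]
warmest <- county_means$station_county[which.max(county_means$temp)]

cat("Coldest:", coldest, "\nWarmest:", warmest, "\n")
```

Applying the same two lines to the full `c_temp1` table should give the same pair of counties as the comments above.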

Which counties saw the record high and record low temperatures in 2017?

Record_low <- aggregate(temp ~ station_county, data = data_set, min)
Record_low
##    station_county temp
## 1          Carlow -4.2
## 2           Cavan -4.4
## 3           Clare -3.3
## 4            Cork -5.6
## 5         Donegal -2.2
## 6          Dublin -6.2
## 7          Galway -3.9
## 8           Kerry -1.9
## 9            Mayo -3.0
## 10          Meath -5.0
## 11      Roscommon -5.0
## 12          Sligo -5.1
## 13      Tipperary -6.0
## 14      Westmeath -4.9
## 15        Wexford -1.7
# Record low temperature (-6.2) was in Dublin County - considering the minimum hourly temperature
Record_high <- aggregate(temp ~ station_county, data = data_set, max)
Record_high
##    station_county temp
## 1          Carlow 25.8
## 2           Cavan 24.9
## 3           Clare 25.5
## 4            Cork 26.6
## 5         Donegal 25.5
## 6          Dublin 28.3
## 7          Galway 24.6
## 8           Kerry 25.5
## 9            Mayo 24.4
## 10          Meath 26.5
## 11      Roscommon 25.7
## 12          Sligo 24.6
## 13      Tipperary 24.5
## 14      Westmeath 25.4
## 15        Wexford 22.8
# Record high temperature (28.3) was in Dublin County - considering the maximum hourly temperature

Heat map of temperature by county and month

ggplot(data_set, aes(x = month, y = station_county, fill = temp)) +
  geom_tile() + scale_fill_gradient(low="blue", high="red") +
  ggtitle ("Temperature wise station_Counties for each month - Tile ") +
  xlab("Month") +  ylab ("Station_County")
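
Because `data_set` holds one row per station per hour, `geom_tile()` receives many rows for the same county and month, and the tiles are drawn on top of each other. One possible refinement, sketched here on invented toy data (`toy` and `monthly_mean` are hypothetical names), is to average the temperature per county and month first so that each tile carries a single value.

```r
# Toy stand-in for data_set: two counties, two months, a few hourly readings.
toy <- data.frame(
  station_county = c("Kerry", "Kerry", "Kerry", "Dublin", "Dublin", "Dublin"),
  month = c(1, 1, 2, 1, 1, 2),
  temp = c(8, 10, 9, 4, 6, 7),
  stringsAsFactors = FALSE
)

# One mean temperature per county and month.
monthly_mean <- aggregate(temp ~ station_county + month, data = toy, FUN = mean)
print(monthly_mean)
```

A table built this way from the real `data_set` could then be passed to the same `ggplot()` call in place of the hourly data.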

Storm Ophelia

On October 16th 2017, Storm Ophelia made landfall in Ireland. We will analyse the observations recorded on that day.

Ophelia_storm <- os %>%
  filter(month==10, day==16)
head(Ophelia_storm)
##   station year month day hour                date rain temp rhum    msl wdsp
## 1 ATHENRY 2017    10  16    0 2017-10-16 00:00:00  0.4  9.9   95 1010.7    7
## 2 ATHENRY 2017    10  16    1 2017-10-16 01:00:00  0.3  9.9   95 1010.3    7
## 3 ATHENRY 2017    10  16    2 2017-10-16 02:00:00  0.4  9.9   95 1008.8    8
## 4 ATHENRY 2017    10  16    3 2017-10-16 03:00:00  0.0  9.8   95 1005.9    9
## 5 ATHENRY 2017    10  16    4 2017-10-16 04:00:00  0.5  9.9   95 1006.0    9
## 6 ATHENRY 2017    10  16    5 2017-10-16 05:00:00  0.1 10.5   96 1002.1   12
##   wddir county height latitude longitude
## 1    20 Galway     40   53.289    -8.786
## 2    10 Galway     40   53.289    -8.786
## 3    10 Galway     40   53.289    -8.786
## 4    20 Galway     40   53.289    -8.786
## 5    40 Galway     40   53.289    -8.786
## 6    40 Galway     40   53.289    -8.786
Record_high_temp_ophelia <- aggregate(temp ~ county, data = Ophelia_storm, max)
Record_high_temp_ophelia
##       county temp
## 1     Carlow 17.4
## 2      Cavan 16.1
## 3      Clare 16.5
## 4       Cork 17.3
## 5    Donegal 16.2
## 6     Dublin 17.3
## 7     Galway 16.6
## 8      Kerry 18.4
## 9       Mayo 15.4
## 10     Meath 17.5
## 11 Roscommon 16.2
## 12     Sligo 15.5
## 13 Tipperary 16.9
## 14 Westmeath 16.0
## 15   Wexford 16.9
#Highest temp was recorded in Kerry
Record_high_rhum_ophelia <- aggregate(rhum ~ county, data = Ophelia_storm, max)
Record_high_rhum_ophelia
##       county rhum
## 1     Carlow   97
## 2      Cavan   98
## 3      Clare   99
## 4       Cork  100
## 5    Donegal   95
## 6     Dublin  100
## 7     Galway   96
## 8      Kerry   95
## 9       Mayo   99
## 10     Meath   99
## 11 Roscommon   98
## 12     Sligo   96
## 13 Tipperary   98
## 14 Westmeath   98
## 15   Wexford   99
#Highest relative humidity was recorded in Dublin and Cork
Record_high_msl_ophelia <- aggregate(msl ~ county, data = Ophelia_storm, max)
Record_high_msl_ophelia
##       county    msl
## 1     Carlow 1011.1
## 2      Cavan 1011.9
## 3      Clare 1009.5
## 4       Cork 1013.4
## 5    Donegal 1013.5
## 6     Dublin 1011.8
## 7     Galway 1010.7
## 8      Kerry 1012.6
## 9       Mayo 1013.2
## 10     Meath 1011.1
## 11 Roscommon 1011.3
## 12     Sligo 1012.7
## 13 Tipperary 1010.2
## 14 Westmeath 1011.5
## 15   Wexford 1012.0
#Highest MSL was recorded in Donegal
Record_high_rain_ophelia <- aggregate(rain ~ county, data = Ophelia_storm, max)
Record_high_rain_ophelia
##       county rain
## 1     Carlow  0.6
## 2      Cavan  2.0
## 3      Clare  1.0
## 4       Cork  3.1
## 5    Donegal  2.4
## 6     Dublin  0.5
## 7     Galway  8.4
## 8      Kerry  9.8
## 9       Mayo  6.2
## 10     Meath  0.4
## 11 Roscommon  1.6
## 12     Sligo  2.0
## 13 Tipperary  1.5
## 14 Westmeath  1.6
## 15   Wexford  0.3
#Highest Rain was recorded in Kerry

One of the factors influenced by storms is atmospheric pressure. We now find the stations that recorded the lowest atmospheric pressure.

lowest_atm_stations <- Ophelia_storm %>% arrange(msl) %>% 
  slice(1:2) %>% 
  pull(station) %>%
  unique()
lowest_atm_stations
## [1] "VALENTIA OBSERVATORY" "MACE HEAD"
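
Sorting the hourly rows and taking the first two can return the same station twice, since one station may hold both of the lowest readings. A hedged alternative, shown on invented values (the stations "A", "B" and "C" and their pressures are not real observations), is to reduce the data to one minimum per station before ranking.

```r
# Invented hourly pressures for three stations (not real observations).
toy <- data.frame(
  station = c("A", "A", "B", "B", "C", "C"),
  msl = c(960.1, 959.8, 975.0, 972.3, 990.4, 988.9),
  stringsAsFactors = FALSE
)

# Minimum pressure per station, then sort ascending.
station_min <- aggregate(msl ~ station, data = toy, FUN = min)
station_min <- station_min[order(station_min$msl), ]

# The two stations with the lowest minimum pressure.
lowest_two <- head(station_min$station, 2)
print(lowest_two)
```

In this toy example the two lowest hourly rows both belong to station "A", whereas the per-station ranking returns two distinct stations.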

Plotting it in a graph:

ggplot(filter(Ophelia_storm,station %in% lowest_atm_stations),aes(x=date,y=msl,colour=station))+
  geom_bar(stat="identity")+
  ggtitle ("Bar Graph for Ophelia Storm and Lowest Atmospheric Stations") +
  xlab("Date") +  ylab ("Mean Sea Level Pressure (hPa)")

Another factor influenced by a storm is wind speed. We now find the stations where the mean hourly wind speed was highest on that date.

highest_wdsp_station<- Ophelia_storm %>% arrange(desc(wdsp)) %>% 
  slice(1:3) %>% 
  pull(station) %>%
  unique()
highest_wdsp_station
## [1] "ROCHES POINT"  "SherkinIsland"
ggplot(filter(Ophelia_storm,station %in% highest_wdsp_station),aes(x=date,y=wdsp,colour=station)) +
 geom_area()

Energy Data

The energy demand over the year, coloured by month, can be viewed:

ggplot(eirgrid17,aes(x=date,y=IEDemand, colour=month))+geom_point()+geom_line()

The wind power generated over the year can be viewed:

ggplot(eirgrid17,aes(x=date,y=IEWindGeneration))+geom_point()+geom_line()

We can join the observations and eirgrid17 data sets to look for relationships between wind speed and energy generated, along with other relevant insights.

new_dataset <- full_join(eirgrid17, os, by = c("year", "month", "day", "hour"))
new_dataset
## # A tibble: 876,025 × 27
##     year month   day  hour minute date.x              NIGeneration NIDemand
##    <dbl> <dbl> <int> <int>  <int> <dttm>                     <dbl>    <dbl>
##  1  2017     1     1     0      0 2017-01-01 00:00:00         889.     776.
##  2  2017     1     1     0      0 2017-01-01 00:00:00         889.     776.
##  3  2017     1     1     0      0 2017-01-01 00:00:00         889.     776.
##  4  2017     1     1     0      0 2017-01-01 00:00:00         889.     776.
##  5  2017     1     1     0      0 2017-01-01 00:00:00         889.     776.
##  6  2017     1     1     0      0 2017-01-01 00:00:00         889.     776.
##  7  2017     1     1     0      0 2017-01-01 00:00:00         889.     776.
##  8  2017     1     1     0      0 2017-01-01 00:00:00         889.     776.
##  9  2017     1     1     0      0 2017-01-01 00:00:00         889.     776.
## 10  2017     1     1     0      0 2017-01-01 00:00:00         889.     776.
## # … with 876,015 more rows, and 19 more variables: NIWindAvailability <dbl>,
## #   NIWindGeneration <dbl>, IEGeneration <dbl>, IEDemand <dbl>,
## #   IEWindAvailability <dbl>, IEWindGeneration <dbl>, SNSP <chr>,
## #   station <chr>, date.y <dttm>, rain <dbl>, temp <dbl>, rhum <dbl>,
## #   msl <dbl>, wdsp <dbl>, wddir <dbl>, county <chr>, height <dbl>,
## #   latitude <dbl>, longitude <dbl>
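
The joined table is large because eirgrid17 is quarter-hourly while the observations are hourly, so every station row for a given hour is repeated once per quarter-hour. The sketch below illustrates this on invented data with base R's `merge()`; the data frames `grid` and `weather` are hypothetical stand-ins, and averaging the grid data to hourly values before joining is only one possible way to avoid the repetition.

```r
# Invented quarter-hourly grid data for two hours (4 rows per hour).
grid <- data.frame(
  hour = rep(0:1, each = 4),
  minute = rep(c(0, 15, 30, 45), times = 2),
  IEWindGeneration = c(100, 110, 120, 130, 200, 210, 220, 230)
)

# Invented hourly weather data: two stations, one row per station per hour.
weather <- data.frame(
  station = rep(c("A", "B"), each = 2),
  hour = rep(0:1, times = 2),
  wdsp = c(8, 12, 10, 15),
  stringsAsFactors = FALSE
)

# Joining directly repeats every station row once per quarter-hour.
direct <- merge(grid, weather, by = "hour")

# Averaging the grid data to hourly first keeps one row per station per hour.
grid_hourly <- aggregate(IEWindGeneration ~ hour, data = grid, FUN = mean)
hourly <- merge(grid_hourly, weather, by = "hour")

cat("Rows, direct join:", nrow(direct), "\n")
cat("Rows, hourly join:", nrow(hourly), "\n")
```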

Energy generated with respect to the wind speed

energy_wind <- aggregate(IEGeneration ~ wdsp, data = new_dataset, sum) 
ggplot(energy_wind,aes(IEGeneration, wdsp)) + geom_point() + geom_line()
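
Note that summing `IEGeneration` per wind speed mixes two effects: how much is generated at that speed and how often that speed occurs. As a rough illustration on invented numbers (the data frame `toy` is hypothetical), the sketch below compares the sum with the mean, which removes the effect of the row count.

```r
# Invented data: a common low wind speed and a rare high wind speed.
toy <- data.frame(
  wdsp = c(5, 5, 5, 5, 20),
  IEGeneration = c(100, 100, 100, 100, 300)
)

# The sum is dominated by how often each wind speed occurs.
by_sum <- aggregate(IEGeneration ~ wdsp, data = toy, FUN = sum)

# The mean reflects the typical generation at each wind speed.
by_mean <- aggregate(IEGeneration ~ wdsp, data = toy, FUN = mean)

print(by_sum)
print(by_mean)
```

Here the sum ranks the low wind speed first only because it appears four times, while the mean ranks the high wind speed first.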

Looking for a relationship between average hourly wind speed and average wind power generated.

months <- new_dataset %>% 
                        group_by(county,wdsp,year,month,day,hour) %>%
                        summarise(AvrEnergyHourWind=mean(IEWindGeneration))
## `summarise()` has grouped output by 'county', 'wdsp', 'year', 'month', 'day'. You can override using the `.groups` argument.
months
## # A tibble: 207,761 × 7
## # Groups:   county, wdsp, year, month, day [57,107]
##    county  wdsp  year month   day  hour AvrEnergyHourWind
##    <chr>  <dbl> <dbl> <dbl> <int> <int>             <dbl>
##  1 Carlow     0  2017     4    18     4              44.9
##  2 Carlow     0  2017     8    31    23             168. 
##  3 Carlow     0  2017     9    25     6              28.8
##  4 Carlow     0  2017    11     1    21             114. 
##  5 Carlow     1  2017     1     2    23             195. 
##  6 Carlow     1  2017     1     3     0             180. 
##  7 Carlow     1  2017     1     4    23             296. 
##  8 Carlow     1  2017     1     5     0             337. 
##  9 Carlow     1  2017     1     5     1             411. 
## 10 Carlow     1  2017     1     5     3             558. 
## # … with 207,751 more rows
ggplot(months,aes(x=wdsp,y=AvrEnergyHourWind,colour=county))+geom_point()+geom_jitter()+geom_smooth()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 23 rows containing non-finite values (stat_smooth).
## Warning: Removed 23 rows containing missing values (geom_point).

## Warning: Removed 23 rows containing missing values (geom_point).

IE Wind Generation in each county with respective wind speed

ggplot(new_dataset,aes(x=wdsp,y=IEWindGeneration,colour=county))+geom_point()+geom_jitter()+geom_smooth()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 25 rows containing non-finite values (stat_smooth).
## Warning: Removed 25 rows containing missing values (geom_point).

## Warning: Removed 25 rows containing missing values (geom_point).

The energy demand per hour/day/month in 2017

energy_demand <- eirgrid17 %>% 
            mutate(DoW=wday(date,label = TRUE),
                   Month=month(date,label=TRUE)) %>%
            group_by(DoW,hour,Month) %>%
            summarise(Max_Energy_Demand=max(IEDemand)) %>%
            ungroup() %>%
            mutate(DoW=factor(DoW,
                              levels = c("Mon","Tue", "Wed","Thu","Fri","Sat","Sun")))
## `summarise()` has grouped output by 'DoW', 'hour'. You can override using the `.groups` argument.

The energy demand for each month in 2017

ggplot(energy_demand, aes(hour, DoW)) +
 geom_tile(aes(fill = Max_Energy_Demand))+scale_fill_gradientn(colours = c("lightblue", "red"))+facet_wrap(~Month,ncol=4)

The energy demand during the Ophelia storm

Ophelia_storm_nd <- new_dataset %>%
  filter(month==10, day==16)
Ophelia_storm_nd
## # A tibble: 2,400 × 27
##     year month   day  hour minute date.x              NIGeneration NIDemand
##    <dbl> <dbl> <int> <int>  <int> <dttm>                     <dbl>    <dbl>
##  1  2017    10    16     0      0 2017-10-16 00:00:00         555.     684.
##  2  2017    10    16     0      0 2017-10-16 00:00:00         555.     684.
##  3  2017    10    16     0      0 2017-10-16 00:00:00         555.     684.
##  4  2017    10    16     0      0 2017-10-16 00:00:00         555.     684.
##  5  2017    10    16     0      0 2017-10-16 00:00:00         555.     684.
##  6  2017    10    16     0      0 2017-10-16 00:00:00         555.     684.
##  7  2017    10    16     0      0 2017-10-16 00:00:00         555.     684.
##  8  2017    10    16     0      0 2017-10-16 00:00:00         555.     684.
##  9  2017    10    16     0      0 2017-10-16 00:00:00         555.     684.
## 10  2017    10    16     0      0 2017-10-16 00:00:00         555.     684.
## # … with 2,390 more rows, and 19 more variables: NIWindAvailability <dbl>,
## #   NIWindGeneration <dbl>, IEGeneration <dbl>, IEDemand <dbl>,
## #   IEWindAvailability <dbl>, IEWindGeneration <dbl>, SNSP <chr>,
## #   station <chr>, date.y <dttm>, rain <dbl>, temp <dbl>, rhum <dbl>,
## #   msl <dbl>, wdsp <dbl>, wddir <dbl>, county <chr>, height <dbl>,
## #   latitude <dbl>, longitude <dbl>
ggplot(Ophelia_storm_nd,aes(hour, IEDemand)) + geom_point() + geom_line()

Summary

Exploratory data analysis was performed on the three data sets and the following conclusions were drawn.

We used all three data sets available in the ‘aimsir17’ package. After data cleaning and preparation, we looked for the rainiest month and the wettest station. We then explored hourly and monthly temperatures, which showed how the temperature rises during certain periods of the year. The humidity analysis showed how rain and humidity levels are interconnected. The station-wise analysis gave the total rainfall for each station; here we focused on the Newport station, as it was the rainiest station overall. From this analysis we drew the following findings.

  • From our observations, humidity and rain are positively related, while humidity and temperature are inversely related.
  • The wettest months in Ireland are usually December and January, but in 2017 our data set shows that September recorded the highest rainfall.
  • The west coast of Ireland receives more rainfall than the east coast, because the wind blowing from the southwest breaks on the mountains of the west.
  • From the data sets we have observed that energy demand rises when the temperature falls.

Likewise, the weather events analysis highlights the significant events that happened during the year in each station county, including Storm Ophelia in October. Finally, the energy outlook covers overall energy demand and generation in the Republic of Ireland throughout the year and shows one notable detail: low temperatures bring high energy demand.

By working on the ‘aimsir17’ data set we learned how complex and detailed atmospheric data can become, and we deepened our knowledge of natural phenomena.

After the analysis of the data we found:

  • As the temperature falls, the energy demand rises. More energy (in particular, electricity) is needed to run heating devices, mainly in the winter season.
  • There is a strong correlation between rain and humidity parameters shown in the data set.
  • Temperature and humidity are inversely related.
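
The relationships listed above can be quantified with a correlation coefficient. The sketch below is a minimal illustration on invented values, not on the report's data: the vectors `rain`, `rhum` and `temp` reuse the column names from the observations but hold made-up numbers, and `use = "complete.obs"` is included on the assumption that the real columns contain missing values.

```r
# Invented readings (not real observations), using the report's column names.
rain <- c(0, 0.1, 0.5, 1.2, 2.0)
rhum <- c(70, 75, 85, 92, 98)
temp <- c(18, 16, 13, 11, 9)

# Pearson correlation; complete.obs drops rows with missing values.
r_rain_rhum <- cor(rain, rhum, use = "complete.obs")
r_temp_rhum <- cor(temp, rhum, use = "complete.obs")

cat("rain vs humidity:", round(r_rain_rhum, 2), "\n")
cat("temperature vs humidity:", round(r_temp_rhum, 2), "\n")
```

A value near +1 or -1 indicates a strong linear relationship, and the sign gives its direction.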

Limitations and Challenges

The challenges we faced were mainly in aggregating the data sets on different parameters and making sure the joined data set produced relevant outputs. This report does not include weather forecasting, because forecasting requires a level of expertise in analysing such data.

The limitations of the data sets were:

  • Lack of previous data for comparisons: There are no data on previous years' weather in the study, which limits the use of the data for predicting or forecasting. It is therefore not possible to reliably estimate the next occurrences of rain, cold or warm weather.
  • The estimates can be wrong: The estimates obtained from weather radar are not 100 percent accurate. This means that the data may be wrong in some cases, which may affect the final decision making.
  • More interference: Radar technology experiences interference from various aspects of the weather, including water, wind, and so on. This may affect the quality of the data and hence the results of the analysis.
  • Cannot detect fog: Weather radar has the limitation of not being able to detect fog.

Area of responsibility

  • I, Sardar yousaf saleem, had primary responsibility for the data cleaning, merging and humidity analysis.
  • I, Madhushree Saravanan, had primary responsibility for the temperature analysis, rainfall analysis and station analysis.
  • I, Harsha Teja N, had primary responsibility for the weather events, subset of station analysis and energy data analysis.
  • I, Priyank, had primary responsibility for the data analysis, energy data analysis, subsets of summary and conclusions.